Calculating the probability of multitaxon evolutionary trees

Author

  • JAMES A. LAKE
Abstract

The reconstruction of multitaxon trees from molecular sequences is confounded by the variety of algorithms and criteria used to evaluate trees, making it difficult to compare the results of different analyses. A global method of multitaxon phylogenetic reconstruction described here, Bootstrappers Gambit, can be used with any four-taxon algorithm, including distance, maximum likelihood, and parsimony methods. It incorporates a Bayesian-Jeffreys'-bootstrap analysis to provide a uniform probability-based criterion for comparing the results from diverse algorithms. To examine the usefulness of the method, the origin of the eukaryotes has been investigated by the analysis of ribosomal small subunit RNA sequences. Three common algorithms (paralinear distances, Jukes-Cantor distances, and Kimura distances) support the eocyte topology, whereas one (maximum parsimony) supports the archaebacterial topology, suggesting that the eocyte prokaryotes are the closest prokaryotic relatives of the eukaryotes.

Determining globally optimal, multitaxon phylogenetic trees is computationally intensive because the number of possible trees increases rapidly with increasing taxa. For four taxa, 3 unrooted trees must be compared, whereas for thirteen, 13,749,310,575 must be compared (1). Evaluating multitaxon trees derived by different methods is further complicated by diverse optimality criteria. For example, distance methods frequently search for local minima by using least-squares criteria, whereas parsimony methods minimize the number of nucleotide changes, often using global searches (2). Currently no common basis exists for reconstructing trees by using different algorithms.

Bayesian and likelihood methods assess the probabilities of trees and can thereby provide such a basis. Sinsheimer et al. (3) developed a method for calculating the probability of trees derived by evolutionary parsimony, but the calculations are complex for trees with more than five taxa. Felsenstein (4) has thoughtfully proposed that bootstrap replicates (5, 6) might provide a good method of assessing the likelihood function in tree reconstruction. Both groups calculate the probability, $P(\mathrm{tree}_j \mid S)$, that the jth tree is correct given aligned sequences, S. These are complex calculations. In this paper one calculates something simpler: the probability, $P(H_j^A \mid S)$, that algorithm A applied to a sequence of infinite length (generated under the same model as S) would yield the jth tree. Under a multinomial model (assuming a Jeffreys' prior probability on the underlying parameters) the integral for calculating $P(H_j^A \mid S)$ can be estimated by bootstrap replication. Bootstrappers Gambit combines this bootstrap with a multitaxon algorithm for any four-taxon method.

AN EXAMPLE

Bootstrappers Gambit functions by decomposing multitaxon trees into sets of four-taxon statements, as illustrated in Fig. 1 for a five-taxon tree. The five aligned sequences at the top of the figure correspond to taxa 1 through 5. Four bootstrap replicates of these five aligned sequences were taken by sampling with replacement. Maximum parsimony is used to analyze taxa four at a time, using the neighbors relationship or, for distances, the weak neighbors relationship (7). For four taxa (i, j, k, and l) three trees are possible (the E tree clusters i with j and k with l; the F tree clusters i with k and j with l; and the G tree clusters i with l and j with k).

For example, in the first column of replicate 1 the quartet represented by taxa 1, 2, 3, and 4 (denoted 1234) corresponds to the sequence pattern AAAA. Since this pattern supports no tree by parsimony, the result is indicated by a blank (-) in the table of quartet values for replicate 1. In the second column the sequences for quartet 1234 are TTCC. Parsimony interprets this pattern as support for the E tree (8), and an e is entered in the quartet value table. The most parsimonious four-taxon trees are then chosen by counting es, fs, and gs at all sequence positions. The four-taxon trees supported at the most positions are entered into the quartet value table. (If two trees tie, then no tree is selected.)

For replicate 1 the pattern of winning four-taxon tree values is EEEEE (quartets 1234, 1235, 1245, 1345, and 2345, respectively). This value pattern is uniquely associated with the tree shown next to the pattern. Some quartet value patterns are inconsistent with trees and may support non-tree graphs (7). For example, the pattern from replicate 2, GEEFE, fits no tree. Details of Gambit, used to relate value patterns to trees, are described in the Appendix.

The last step involves calculating the probability of each tree. The conditional probability that a particular tree would be supported with infinite data is given by the number of replicates supporting the tree divided by the total number of replicates supporting trees (see Appendix). In the example, two replicates yield the EEEEE tree and three replicates in total support trees (replicate 2 supports none), so the probability of the EEEEE tree is estimated as 2/3 and the probability of the GEFFF tree as 1/3. Better estimates can be provided by taking more replicates.
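
The tree counts quoted in the introduction (3 for four taxa, 13,749,310,575 for thirteen) agree with the standard count of unrooted, fully bifurcating trees, the product of the odd numbers up to 2n - 5. The text does not state this formula, so the sketch below is an illustration under that assumption, and the function name `unrooted_tree_count` is hypothetical.

```python
def unrooted_tree_count(n_taxa):
    """Number of unrooted, fully bifurcating trees on n_taxa labeled taxa.

    Assumes the standard double-factorial count 3 * 5 * ... * (2n - 5);
    this formula is not quoted in the text itself.
    """
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers from 3 up to and including 2n - 5.
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


if __name__ == "__main__":
    for n in (4, 5, 13):
        print(n, unrooted_tree_count(n))
```

For four and thirteen taxa this reproduces the two figures given in the text; for five taxa it gives 15, the number of candidate trees in the five-taxon example.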
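
The column-by-column parsimony scoring of a quartet, and the resampling that produces a bootstrap replicate, can be sketched roughly as follows. This is a minimal illustration, not the author's program: the function names (`site_support`, `quartet_winner`, `bootstrap_replicate`) and the toy sequences are assumptions made for the example, and only the rules stated in the text are used (AAAA supports no tree, TTCC supports E, ties select no tree).

```python
import random
from collections import Counter


def site_support(a, b, c, d):
    """Return 'e', 'f', or 'g' for an informative column, else None.

    The four bases belong to taxa i, j, k, l in that order.
    """
    if a == b and c == d and a != c:
        return "e"  # pattern xxyy: i with j, k with l
    if a == c and b == d and a != b:
        return "f"  # pattern xyxy: i with k, j with l
    if a == d and b == c and a != b:
        return "g"  # pattern xyyx: i with l, j with k
    return None     # e.g. AAAA supports no tree


def quartet_winner(seqs):
    """Return 'E', 'F', or 'G' for four aligned sequences, or None on a tie."""
    votes = Counter()
    for column in zip(*seqs):
        support = site_support(*column)
        if support is not None:
            votes[support] += 1
    ranked = votes.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tied trees: no tree is selected
    return ranked[0][0].upper()


def bootstrap_replicate(seqs, rng):
    """Resample alignment columns with replacement, keeping the length."""
    length = len(seqs[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in seqs]


if __name__ == "__main__":
    # Toy quartet: first column AAAA (no support), second column TTCC (e).
    quartet = ["AT", "AT", "AC", "AC"]
    print(quartet_winner(quartet))
    print(bootstrap_replicate(quartet, random.Random(1)))
```

Passing an explicit `random.Random` instance keeps replicates reproducible, which helps when comparing runs across different four-taxon methods.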
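
The final probability step can be illustrated for the five-taxon case. The Gambit procedure that relates value patterns to trees is described in the Appendix and is not reproduced here; as a stand-in, the sketch below simply enumerates all 15 unrooted five-taxon trees (two two-taxon clusters joined through a middle taxon) and derives the quartet value pattern of each. All names are hypothetical, and the patterns assigned to replicates 3 and 4 are an assumption, since the text gives only replicates 1 and 2 explicitly along with the totals.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

TAXA = (1, 2, 3, 4, 5)
QUARTETS = list(combinations(TAXA, 4))  # 1234, 1235, 1245, 1345, 2345


def letter(quartet, side):
    """E, F, or G depending on which taxon is grouped with quartet[0]."""
    i, j, k, l = quartet
    if i not in side:
        side = frozenset(quartet) - side
    (partner,) = side - {i}
    return {j: "E", k: "F", l: "G"}[partner]


def five_taxon_trees():
    """Yield (pair_a, middle, pair_b) for every unrooted five-taxon tree."""
    for middle in TAXA:
        rest = [t for t in TAXA if t != middle]
        for mate in rest[1:]:
            pair_a = frozenset((rest[0], mate))
            yield pair_a, middle, frozenset(rest) - pair_a


def pattern(tree):
    """Quartet value pattern implied by a five-taxon tree."""
    pair_a, _middle, pair_b = tree
    letters = []
    for quartet in QUARTETS:
        (missing,) = set(TAXA) - set(quartet)
        # The pair left intact by dropping `missing` forms one side.
        side = pair_b if missing in pair_a else pair_a
        letters.append(letter(quartet, side))
    return "".join(letters)


TREE_PATTERNS = {pattern(tree) for tree in five_taxon_trees()}


def tree_probabilities(replicate_patterns):
    """Replicates supporting each tree / replicates supporting any tree."""
    supported = [p for p in replicate_patterns if p in TREE_PATTERNS]
    counts = Counter(supported)
    return {p: Fraction(n, len(supported)) for p, n in counts.items()}


if __name__ == "__main__":
    # Replicates 1 and 2 follow the text; the order of 3 and 4 is assumed.
    replicates = ["EEEEE", "GEEFE", "EEEEE", "GEFFF"]
    for pat, prob in tree_probabilities(replicates).items():
        print(pat, prob)
```

With these four replicates the enumeration rejects GEEFE as fitting no tree and returns 2/3 for EEEEE and 1/3 for GEFFF, matching the estimates in the text. Brute-force enumeration is only practical for very few taxa, which is presumably why a dedicated procedure is needed for larger trees.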

Publication date: 2005